Toward Reliable and Rapid Elasticity for Streaming Dataflows on Clouds
Abstract
The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in input rates and task behavior by scaling in and out on elastic Cloud resources. Platforms like Apache Storm do not provide robust capabilities for responding to such dynamism or for rapid task migration across VMs. We propose several dataflow checkpoint and migration approaches that allow a running streaming dataflow to migrate without any loss of in-flight messages or of the internal state of its tasks, while reducing the time to recover and stabilize. We implement and evaluate these migration strategies on Apache Storm using micro and application dataflows, scaling in and out on 2 to 21 Azure VMs. Our results show that we can migrate large dataflows within 50 sec, compared to Storm's default approach, which takes over 100 sec. We also find that our approaches stabilize the application much earlier, with no failure or re-processing of messages.
Related resources
A Relational Approach to Complex Dataflows
Clouds have become an attractive platform for highly scalable processing of Big Data, especially due to the concept of elasticity, which characterizes them. Several languages and systems for cloud-based data processing have been proposed in the past, with the most popular among them being based on MapReduce [7]. In this paper, we present Exareme, a system for elastic large-scale data processing...
A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding
Multiple description (MD) coding has evolved as a promising technique for promoting error resiliency of multimedia systems in real-time applications over error-prone communication channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting reliable data in the context of potential error channels, this method doesn't consider disc...
Towards Elastic Stream Processing: Patterns and Infrastructure
Distributed, highly parallel processing frameworks such as Hadoop are deemed to be state-of-the-art for handling big data today. But they burden application developers with the task of manually implementing program logic using low-level batch processing APIs. Thus, a movement can be observed toward high-level languages that allow developers to declaratively model dataflows that are automatically optim...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Realizing a Self-Adaptive Network Architecture for HPC Clouds
Clouds offer significant advantages over traditional cluster computing architectures including ease of deployment, rapid elasticity, and an economically attractive pay-as-you-go business model. However, the effectiveness of cloud computing for HPC systems still remains questionable. When clouds are deployed on lossless interconnection networks, challenges related to load-balancing, low-overhead...
Journal: CoRR
Volume: abs/1712.00605
Publication date: 2017